OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Ocropodium: open source OCR for small-scale historical archives

Internal identifier: 000305 (Main/Exploration); previous: 000304; next: 000306

Authors: Tobias Blanke [United Kingdom]; Michael Bryant [United Kingdom]; Mark Hedges [United Kingdom]

Source: Journal of information science (ISSN 0165-5515), 2012

RBID : Pascal:13-0290838

Abstract

Large-scale digitization projects dealing with text-based historical material face challenges that are not well catered for by commercial software. This article discusses the results of a project to build a scalable OCR workflow for historical collections based on open source tools that is particularly tailored towards use in small-scale historical archives. It argues that open source tools allow for better customization to match these requirements, particularly with regard to character model training and per-project language modelling. We offer insights into our accuracy evaluation results of various open source OCR tools, as well as a case study about the challenges and opportunities of open source OCR in historical archives.


Affiliations:



The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Ocropodium: open source OCR for small-scale historical archives</title>
<author>
<name sortKey="Blanke, Tobias" sort="Blanke, Tobias" uniqKey="Blanke T" first="Tobias" last="Blanke">Tobias Blanke</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>King's College London</s1>
<s3>GBR</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>Royaume-Uni</country>
<wicri:noRegion>King's College London</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Bryant, Michael" sort="Bryant, Michael" uniqKey="Bryant M" first="Michael" last="Bryant">Michael Bryant</name>
<affiliation wicri:level="1">
<inist:fA14 i1="02">
<s1>King's College London</s1>
<s3>GBR</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Royaume-Uni</country>
<wicri:noRegion>King's College London</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Hedges, Mark" sort="Hedges, Mark" uniqKey="Hedges M" first="Mark" last="Hedges">Mark Hedges</name>
<affiliation wicri:level="1">
<inist:fA14 i1="03">
<s1>King's College London</s1>
<s3>GBR</s3>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Royaume-Uni</country>
<wicri:noRegion>King's College London</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">13-0290838</idno>
<date when="2012">2012</date>
<idno type="stanalyst">PASCAL 13-0290838 INIST</idno>
<idno type="RBID">Pascal:13-0290838</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000048</idno>
<idno type="stanalyst">FRANCIS 13-0290838 INIST</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000073</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000720</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000067</idno>
<idno type="wicri:doubleKey">0165-5515:2012:Blanke T:ocropodium:open:source</idno>
<idno type="wicri:Area/Main/Merge">000308</idno>
<idno type="wicri:Area/Main/Curation">000305</idno>
<idno type="wicri:Area/Main/Exploration">000305</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">Ocropodium: open source OCR for small-scale historical archives</title>
<author>
<name sortKey="Blanke, Tobias" sort="Blanke, Tobias" uniqKey="Blanke T" first="Tobias" last="Blanke">Tobias Blanke</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>King's College London</s1>
<s3>GBR</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>Royaume-Uni</country>
<wicri:noRegion>King's College London</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Bryant, Michael" sort="Bryant, Michael" uniqKey="Bryant M" first="Michael" last="Bryant">Michael Bryant</name>
<affiliation wicri:level="1">
<inist:fA14 i1="02">
<s1>King's College London</s1>
<s3>GBR</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Royaume-Uni</country>
<wicri:noRegion>King's College London</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Hedges, Mark" sort="Hedges, Mark" uniqKey="Hedges M" first="Mark" last="Hedges">Mark Hedges</name>
<affiliation wicri:level="1">
<inist:fA14 i1="03">
<s1>King's College London</s1>
<s3>GBR</s3>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Royaume-Uni</country>
<wicri:noRegion>King's College London</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">Journal of information science</title>
<title level="j" type="abbreviated">J. inf. sci.</title>
<idno type="ISSN">0165-5515</idno>
<imprint>
<date when="2012">2012</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">Journal of information science</title>
<title level="j" type="abbreviated">J. inf. sci.</title>
<idno type="ISSN">0165-5515</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Large-scale digitization projects dealing with text-based historical material face challenges that are not well catered for by commercial software. This article discusses the results of a project to build a scalable OCR workflow for historical collections based on open source tools that is particularly tailored towards use in small-scale historical archives. It argues that open source tools allow for better customization to match these requirements, particularly with regard to character model training and per-project language modelling. We offer insights into our accuracy evaluation results of various open source OCR tools, as well as a case study about the challenges and opportunities of open source OCR in historical archives.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Royaume-Uni</li>
</country>
</list>
<tree>
<country name="Royaume-Uni">
<noRegion>
<name sortKey="Blanke, Tobias" sort="Blanke, Tobias" uniqKey="Blanke T" first="Tobias" last="Blanke">Tobias Blanke</name>
</noRegion>
<name sortKey="Bryant, Michael" sort="Bryant, Michael" uniqKey="Bryant M" first="Michael" last="Bryant">Michael Bryant</name>
<name sortKey="Hedges, Mark" sort="Hedges, Mark" uniqKey="Hedges M" first="Mark" last="Hedges">Mark Hedges</name>
</country>
</tree>
</affiliations>
</record>
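
The record above uses the `wicri:` and `inist:` prefixes without declaring them, so a namespace-aware XML parser will typically reject it as written. The following is a minimal sketch, not part of Dilib, of one way to read such a record with Python's standard library. The namespace URIs, the function names `parse_record` and `summarize`, and the shortened `SAMPLE` record are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs: the record does not declare them, so these
# values are assumptions made only so the parser accepts the prefixes.
DUMMY_NS = {
    "wicri": "urn:example:wicri",
    "inist": "urn:example:inist",
}


def parse_record(xml_text):
    """Parse a <record> whose wicri:/inist: prefixes are undeclared."""
    decls = " ".join(f'xmlns:{p}="{uri}"' for p, uri in DUMMY_NS.items())
    # Declare the prefixes on the root element (first occurrence only).
    patched = xml_text.replace("<record>", f"<record {decls}>", 1)
    return ET.fromstring(patched)


def summarize(record):
    """Return title, authors, journal and year from the TEI header."""
    file_desc = record.find("./TEI/teiHeader/fileDesc")
    title_stmt = file_desc.find("titleStmt")
    return {
        "title": title_stmt.findtext("title"),
        "authors": [n.text for n in title_stmt.findall("author/name")],
        "journal": file_desc.findtext("seriesStmt/title[@type='main']"),
        "year": file_desc.find("publicationStmt/date").get("when"),
    }


# Shortened, hypothetical excerpt of the record shown above.
SAMPLE = """<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Ocropodium: open source OCR for small-scale historical archives</title>
<author>
<name sortKey="Blanke, Tobias">Tobias Blanke</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01"><s1>King's College London</s1></inist:fA14>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<date when="2012">2012</date>
</publicationStmt>
<seriesStmt>
<title level="j" type="main">Journal of information science</title>
</seriesStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>"""

if __name__ == "__main__":
    print(summarize(parse_record(SAMPLE)))
```

Patching the root element keeps the rest of the record untouched; if the real namespace URIs are known, they can replace the placeholders in `DUMMY_NS`.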

To work with this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000305 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000305 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:13-0290838
   |texte=   Ocropodium: open source OCR for small-scale historical archives
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024